In artificial intelligence, symbolic artificial intelligence (also known as classical artificial intelligence or logic-based artificial intelligence) is the term for the collection of all methods in artificial intelligence research that are based on high-level symbolic (human-readable) representations of problems, Formal logic, and search algorithm. Symbolic AI used tools such as logic programming, production rules, semantic nets and frames, and it developed applications such as knowledge-based systems (in particular, expert systems), symbolic mathematics, automated theorem provers, ontologies, the semantic web, and automated planning and scheduling systems. The Symbolic AI paradigm led to important ideas in search, symbolic programming languages, agents, multi-agent systems, the semantic web, and the strengths and limitations of formal knowledge and reasoning systems.
Symbolic AI was the dominant paradigm of AI research from the mid-1950s until the mid-1990s. Researchers in the 1960s and the 1970s were convinced that symbolic approaches would eventually succeed in creating a machine with artificial general intelligence and considered this the ultimate goal of their field. An early boom, with early successes such as the Logic Theorist and Samuel's Checkers Playing Program, led to unrealistic expectations and promises and was followed by the first AI winter as funding dried up. A second boom (1969–1986) occurred with the rise of expert systems, their promise of capturing corporate expertise, and an enthusiastic corporate embrace. That boom, and some early successes, e.g., with XCON at DEC, was followed again by later disappointment. Problems with difficulties in knowledge acquisition, maintaining large knowledge bases, and brittleness in handling out-of-domain problems arose. Another, second, AI Winter (1988–2011) followed. Subsequently, AI researchers focused on addressing underlying problems in handling uncertainty and in knowledge acquisition. Uncertainty was addressed with formal methods such as hidden Markov models, Bayesian reasoning, and statistical relational learning. Symbolic machine learning addressed the knowledge acquisition problem with contributions including Version Space, Leslie Valiant's PAC learning, Ross Quinlan's ID3 decision-tree learning, case-based learning, and inductive logic programming to learn relations.
Neural networks, a subsymbolic approach, had been pursued from early days and reemerged strongly in 2012. Early examples are Frank Rosenblatt's perceptron learning work, the backpropagation work of Rumelhart, Hinton and Williams, and work in convolutional neural networks by LeCun et al. in 1989. However, neural networks were not viewed as successful until about 2012: "Until Big Data became commonplace, the general consensus in the Al community was that the so-called neural-network approach was hopeless. Systems just didn't work that well, compared to other methods. ... A revolution came in 2012, when a number of people, including a team of researchers working with Hinton, worked out a way to use the power of GPUs to enormously increase the power of neural networks." Over the next several years, deep learning had spectacular success in handling vision, speech recognition, speech synthesis, image generation, and machine translation, though symbolic approaches continue to be useful in a few domains such as computer algebra systems and Proof assistant.
However, given the inherent complexity of intelligence itself, it remains an open question whether symbolic AI will be completely supplanted by connectionist AI, or whether symbolic AI may yet experience a resurgence. More recently, work by Zhang et al. has further argued that, at least at the theoretical level, there is no clear superiority among different technological paradigms.
An important early symbolic AI program was the Logic theorist, written by Allen Newell, Herbert Simon and Cliff Shaw in 1955–56, as it was able to prove 38 elementary theorems from Whitehead and Russell's Principia Mathematica. Newell, Simon, and Shaw later generalized this work to create a domain-independent problem solver, GPS (General Problem Solver). GPS solved problems represented with formal operators via state-space search using means-ends analysis.
During the 1960s, symbolic approaches achieved great success at simulating intelligent behavior in structured environments such as game-playing, symbolic mathematics, and theorem-proving. AI research was concentrated in four institutions in the 1960s: Carnegie Mellon University, Stanford, MIT and (later) University of Edinburgh. Each one developed its own style of research. Earlier approaches based on cybernetics or artificial neural networks were abandoned or pushed into the background.
Herbert Simon and Allen Newell studied human problem-solving skills and attempted to formalize them, and their work laid the foundations of the field of artificial intelligence, as well as cognitive science, operations research and management science. Their research team used the results of psychology experiments to develop programs that simulated the techniques that people used to solve problems. This tradition, centered at Carnegie Mellon University would eventually culminate in the development of the Soar architecture in the middle 1980s.
Edward Feigenbaum said:
to describe that high performance in a specific domain requires both general and highly domain-specific knowledge. Ed Feigenbaum and Doug Lenat called this The Knowledge Principle:
Key expert systems were:
DENDRAL is considered the first expert system that relied on knowledge-intensive problem-solving. It is described below, by Ed Feigenbaum, from a Communications of the ACM interview, Interview with Ed Feigenbaum:
The other expert systems mentioned above came after DENDRAL. MYCIN exemplifies the classic expert system architecture of a knowledge-base of rules coupled to a symbolic reasoning mechanism, including the use of certainty factors to handle uncertainty. GUIDON shows how an explicit knowledge base can be repurposed for a second application, tutoring, and is an example of an intelligent tutoring system, a particular kind of knowledge-based application. Clancey showed that it was not sufficient simply to use MYCIN's rules for instruction, but that he also needed to add rules for dialogue management and student modeling. XCON is significant because of the millions of dollars it saved DEC, which triggered the expert system boom where most all major corporations in the US had expert systems groups, to capture corporate expertise, preserve it, and automate it:
Chess expert knowledge was encoded in Deep Blue. In 1996, this allowed IBM's Deep Blue, with the help of symbolic AI, to win in a game of chess against the world champion at that time, Garry Kasparov.
Expert systems can operate in either a forward chaining – from evidence to conclusions – or backward chaining – from goals to needed data and prerequisites – manner. More advanced knowledge-based systems, such as Soar can also perform meta-level reasoning, that is reasoning about their own reasoning in terms of deciding how to solve problems and monitoring the success of problem-solving strategies.
Blackboard systems are a second kind of knowledge-based or expert system architecture. They model a community of experts incrementally contributing, where they can, to solve a problem. The problem is represented in multiple levels of abstraction or alternate views. The experts (knowledge sources) volunteer their services whenever they recognize they can contribute. Potential problem-solving actions are represented on an agenda that is updated as the problem situation changes. A controller decides how useful each contribution is, and who should make the next problem-solving action. One example, the BB1 blackboard architecture was originally inspired by studies of how humans plan to perform multiple tasks in a trip. An innovation of BB1 was to apply the same blackboard model to solving its control problem, i.e., its controller performed meta-level reasoning with knowledge sources that monitored how well a plan or the problem-solving was proceeding and could switch from one strategy to another as conditions – such as goals or times – changed. BB1 has been applied in multiple domains: construction site planning, intelligent tutoring systems, and real-time patient monitoring.
Unfortunately, the AI boom did not last and Kautz best describes the second AI winter that followed:
One statistical approach, hidden Markov models, had already been popularized in the 1980s for speech recognition work. Subsequently, in 1988, Judea Pearl popularized the use of Bayesian Networks as a sound but efficient way of handling uncertain reasoning with his publication of the book Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. and Bayesian approaches were applied successfully in expert systems. Even later, in the 1990s, statistical relational learning, an approach that combines probability with logical formulas, allowed probability to be combined with first-order logic, e.g., with either Markov Logic Networks or Probabilistic Soft Logic.
Other, non-probabilistic extensions to first-order logic to support were also tried. For example, non-monotonic reasoning could be used with truth maintenance systems. A truth maintenance system tracked assumptions and justifications for all inferences. It allowed inferences to be withdrawn when assumptions were found out to be incorrect or a contradiction was derived. Explanations could be provided for an inference by explaining which rules were applied to create it and then continuing through underlying inferences and rules all the way back to root assumptions. Lotfi Zadeh had introduced a different kind of extension to handle the representation of vagueness. For example, in deciding how "heavy" or "tall" a man is, there is frequently no clear "yes" or "no" answer, and a predicate for heavy or tall would instead return values between 0 and 1. Those values represented to what degree the predicates were true. His fuzzy logic further provided a means for propagating combinations of these values through logical formulas.
In contrast to the knowledge-intensive approach of Meta-DENDRAL, Ross Quinlan invented a domain-independent approach to statistical classification, decision tree learning, starting first with ID3 and then later extending its capabilities to C4.5. The decision trees created are glass box, interpretable classifiers, with human-interpretable classification rules.
Advances were made in understanding machine learning theory, too. Tom Mitchell introduced version space learning which describes learning as a search through a space of hypotheses, with upper, more general, and lower, more specific, boundaries encompassing all viable hypotheses consistent with the examples seen so far. More formally, Leslie Valiant introduced Probably Approximately Correct Learning (PAC Learning), a framework for the mathematical analysis of machine learning.
Symbolic machine learning encompassed more than learning by example. E.g., John Anderson provided a cognitive model of human learning where skill practice results in a compilation of rules from a declarative format to a procedural format with his ACT-R cognitive architecture. For example, a student might learn to apply "Supplementary angles are two angles whose measures sum 180 degrees" as several different procedural rules. E.g., one rule might say that if X and Y are supplementary and you know X, then Y will be 180 - X. He called his approach "knowledge compilation". ACT-R has been used successfully to model aspects of human cognition, such as learning and retention. ACT-R is also used in intelligent tutoring systems, called cognitive tutors, to successfully teach geometry, computer programming, and algebra to school children.
Inductive logic programming was another approach to learning that allowed logic programs to be synthesized from input-output examples. E.g., Ehud Shapiro's MIS (Model Inference System) could synthesize Prolog programs from examples. John R. Koza applied genetic algorithms to program synthesis to create genetic programming, which he used to synthesize LISP programs. Finally, Zohar Manna and Richard Waldinger provided a more general approach to program synthesis that synthesizes a functional program in the course of proving its specifications to be correct.
As an alternative to logic, Roger Schank introduced case-based reasoning (CBR). The CBR approach outlined in his book, Dynamic Memory, focuses first on remembering key problem-solving cases for future use and generalizing them where appropriate. When faced with a new problem, CBR retrieves the most similar previous case and adapts it to the specifics of the current problem. Another alternative to logic, genetic algorithms and genetic programming are based on an evolutionary model of learning, where sets of rules are encoded into populations, the rules govern the behavior of individuals, and selection of the fittest prunes out sets of unsuitable rules over many generations.
Symbolic machine learning was applied to learning concepts, rules, heuristics, and problem-solving. Approaches, other than those above, include:
Henry Kautz, Francesca Rossi, and Bart Selman have also argued for a synthesis. Their arguments are based on a need to address the two kinds of thinking discussed in Daniel Kahneman's book, Thinking, Fast and Slow. Kahneman describes human thinking as having two components, System 1 and System 2. System 1 is fast, automatic, intuitive and unconscious. System 2 is slower, step-by-step, and explicit. System 1 is the kind used for pattern recognition while System 2 is far better suited for planning, deduction, and deliberative thinking. In this view, deep learning best models the first kind of thinking while symbolic reasoning best models the second kind and both are needed.
Artur Garcez and Lamb describe research in this area as being ongoing for at least the past twenty years, dating from their 2002 book on neurosymbolic learning systems. A series of workshops on neuro-symbolic reasoning has been held every year since 2005.http://www.neural-symbolic.org
In their 2015 paper, Neural-Symbolic Learning and Reasoning: Contributions and Challenges, Garcez et al. argue that:
Approaches for integration are varied. Henry Kautz's taxonomy of neuro-symbolic architectures, along with some examples, follows:
Many key research questions remain, such as:
Other key innovations pioneered by LISP that have spread to other programming languages include:
In contrast to the US, in Europe the key AI programming language during that same period was Prolog. Prolog provided a built-in store of facts and clauses that could be queried by a read-eval-print loop. The store could act as a knowledge base and the clauses could act as rules or a restricted form of logic. As a subset of first-order logic Prolog was based on Horn clauses with a closed-world assumption—any facts not known were considered false—and a unique name assumption for primitive terms—e.g., the identifier barack_obama was considered to refer to exactly one object. Backtracking and unification are built-in to Prolog.
Alain Colmerauer and Philippe Roussel are credited as the inventors of Prolog. Prolog is a form of logic programming, which was invented by Robert Kowalski. Its history was also influenced by Carl Hewitt's PLANNER, an assertional database with pattern-directed invocation of methods. For more detail see the section on the origins of Prolog in the PLANNER article.
Prolog is also a kind of declarative programming. The logic clauses that describe programs are directly interpreted to run the programs specified. No explicit series of actions is required, as is the case with imperative programming languages.
Japan championed Prolog for its Fifth Generation Project, intending to build special hardware for high performance. Similarly, LISP machines were built to run LISP, but as the second AI boom turned to bust these companies could not compete with new workstations that could now run LISP or Prolog natively at comparable speeds. See the history section for more detail.
Smalltalk was another influential AI programming language. For example, it introduced metaclasses and, along with Flavors and CommonLoops, influenced the Common Lisp Object System, or (CLOS), that is now part of Common Lisp, the current standard Lisp dialect. CLOS is a Lisp-based object-oriented system that allows multiple inheritance, in addition to incremental extensions to both classes and metaclasses, thus providing a run-time meta-object protocol.
For other AI programming languages see this list of programming languages for artificial intelligence. Currently, Python, a multi-paradigm programming language, is the most popular programming language, partly due to its extensive package library that supports data science, natural language processing, and deep learning. Python includes a read-eval-print loop, functional elements such as higher-order functions, and object-oriented programming that includes metaclasses.
Description logic is a logic for automated classification of ontologies and for detecting inconsistent classification data. OWL is a language used to represent ontologies with description logic. Protégé is an ontology editor that can read in OWL ontologies and then check consistency with deductive classifiers such as such as HermiT.
First-order logic is more general than description logic. The automated theorem provers discussed below can prove theorems in first-order logic. Horn clause logic is more restricted than first-order logic and is used in logic programming languages such as Prolog. Extensions to first-order logic include temporal logic, to handle time; epistemic logic, to reason about agent knowledge; modal logic, to handle possibility and necessity; and probabilistic logics to handle logic and probability together.
Forward chaining inference engines are the most common, and are seen in CLIPS and OPS5. Backward chaining occurs in Prolog, where a more limited logical representation is used, Horn clause. Pattern-matching, specifically unification, is used in Prolog.
A more flexible kind of problem-solving occurs when reasoning about what to do next occurs, rather than simply choosing one of the available actions. This kind of meta-level reasoning is used in Soar and in the BB1 blackboard architecture.
Cognitive architectures such as ACT-R may have additional capabilities, such as the ability to compile frequently used knowledge into higher-level chunks.
Qualitative simulation, such as Benjamin Kuipers's QSIM, approximates human reasoning about naive physics, such as what happens when we heat a liquid in a pot on the stove. We expect it to heat and possibly boil over, even though we may not know its temperature, its boiling point, or other details, such as atmospheric pressure.
Similarly, Allen's temporal interval algebra is a simplification of reasoning about time and Region Connection Calculus is a simplification of reasoning about spatial relationships. Both can be solved with constraint solvers.
Parsing, tokenizing, spell checker, part-of-speech tagging, shallow parsing are all aspects of natural language processing long handled by symbolic AI, but since improved by deep learning approaches. In symbolic AI, discourse representation theory and first-order logic have been used to represent sentence meanings. Latent semantic analysis (LSA) and explicit semantic analysis also provided vector representations of documents. In the latter case, vector components are interpretable as concepts named by Wikipedia articles.
New deep learning approaches based on Transformer models have now eclipsed these earlier symbolic AI approaches and attained state-of-the-art performance in natural language processing. However, Transformer models are opaque and do not yet produce human-interpretable semantic representations for sentences and documents. Instead, they produce task-specific vectors where the meaning of the vector components is opaque.
In contrast, a multi-agent system consists of multiple agents that communicate amongst themselves with some inter-agent communication language such as Knowledge Query and Manipulation Language (KQML). The agents need not all have the same internal architecture. Advantages of multi-agent systems include the ability to divide work among the agents and to increase fault tolerance when agents are lost. Research problems include how agents reach consensus, distributed problem solving, multi-agent learning, multi-agent planning, and distributed constraint optimization.
McCarthy and Hayes introduced the Frame problem in 1969 in the paper, "Some Philosophical Problems from the Standpoint of Artificial Intelligence." A simple example occurs in "proving that one person could get into conversation with another", as an axiom asserting "if a person has a telephone he still has it after looking up a number in the telephone book" would be required for the deduction to succeed. Similar axioms would be required for other domain actions to specify what did not change.
A similar problem, called the Qualification Problem, occurs in trying to enumerate the preconditions for an action to succeed. An infinite number of pathological conditions can be imagined, e.g., a banana in a tailpipe could prevent a car from operating correctly.
McCarthy's approach to fix the frame problem was circumscription, a kind of non-monotonic logic where deductions could be made from actions that need only specify what would change while not having to explicitly specify everything that would not change. Other non-monotonic logics provided truth maintenance systems that revised beliefs leading to contradictions.
Other ways of handling more open-ended domains included probabilistic reasoning systems and machine learning to learn new concepts and rules. McCarthy's Advice taker can be viewed as an inspiration here, as it could incorporate new knowledge provided by a human in the form of assertions or rules. For example, experimental symbolic machine learning systems explored the ability to take high-level natural language advice and to interpret it into domain-specific actionable rules.
Similar to the problems in handling dynamic domains, common-sense reasoning is also difficult to capture in formal reasoning. Examples of common-sense reasoning include implicit reasoning about how people think or general knowledge of day-to-day events, objects, and living creatures. This kind of knowledge is taken for granted and not viewed as noteworthy. Common-sense reasoning is an open area of research and challenging both for symbolic systems (e.g., Cyc has attempted to capture key parts of this knowledge over more than a decade) and neural systems (e.g., that do not know not to drive into cones or not to hit pedestrians walking a bicycle).
McCarthy viewed his Advice taker as having common-sense, but his definition of common-sense was different than the one above. He defined a program as having common sense " if it automatically deduces for itself a sufficiently wide class of immediate consequences of anything it is told and what it already knows."
Three philosophical positions have been outlined among connectionists:
Olazaran, in his sociological history of the controversies within the neural network community, described the moderate connectionism view as essentially compatible with current research in neuro-symbolic hybrids:
The third and last position I would like to examine here is what I call the moderate connectionist view, a more eclectic view of the current debate between connectionism and symbolic AI. One of the researchers who has elaborated this position most explicitly is Andy Clark, a philosopher from the School of Cognitive and Computing Sciences of the University of Sussex (Brighton, England). Clark defended hybrid (partly symbolic, partly connectionist) systems. He claimed that (at least) two kinds of theories are needed in order to study and model cognition. On the one hand, for some information-processing tasks (such as pattern recognition) connectionism has advantages over symbolic models. But on the other hand, for other cognitive processes (such as serial, deductive reasoning, and generative symbol manipulation processes) the symbolic paradigm offers adequate models, and not only "approximations" (contrary to what radical connectionists would claim).
Gary Marcus has claimed that the animus in the deep learning community against symbolic approaches now may be more sociological than philosophical:
To think that we can simply abandon symbol-manipulation is to suspend disbelief.According to Marcus, Geoffrey Hinton and his colleagues have been vehemently "anti-symbolic":And yet, for the most part, that's how most current AI proceeds. Geoffrey Hinton and many others have tried hard to banish symbols altogether. The deep learning hope—seemingly grounded not so much in science, but in a sort of historical grudge—is that intelligent behavior will emerge purely from the confluence of massive data and deep learning. Where classical computers and software solve tasks by defining sets of symbol-manipulating rules dedicated to particular jobs, such as editing a line in a word processor or performing a calculation in a spreadsheet, neural networks typically try to solve tasks by statistical approximation and learning from examples.
When deep learning reemerged in 2012, it was with a kind of take-no-prisoners attitude that has characterized most of the last decade. By 2015, his hostility toward all things symbols had fully crystallized. He gave a talk at an AI workshop at Stanford comparing symbols to aether, one of science's greatest mistakes....
Since then, his anti-symbolic campaign has only increased in intensity. In 2016, Yann LeCun, Yoshua Bengio, and Hinton wrote a manifesto for deep learning in one of science's most important journals, Nature. It closed with a direct attack on symbol manipulation, calling not for reconciliation but for outright replacement. Later, Hinton told a gathering of European Union leaders that investing any further money in symbol-manipulating approaches was "a huge mistake," likening it to investing in internal combustion engines in the era of electric cars.
Part of these disputes may be due to unclear terminology:
Turing award winner Judea Pearl offers a critique of machine learning which, unfortunately, conflates the terms machine learning and deep learning. Similarly, when Geoffrey Hinton refers to symbolic AI, the connotation of the term tends to be that of expert systems dispossessed of any ability to learn. The use of the terminology is in need of clarification. Machine learning is not confined to association rule mining, c.f. the body of work on symbolic ML and relational learning (the differences to deep learning being the choice of representation, localist logical rather than distributed, and the non-use of gradient descent). Equally, symbolic AI is not just about production rules written by hand. A proper definition of AI concerns knowledge representation and reasoning, autonomous multi-agent systems, planning and argumentation, as well as learning.It is worth noting that, from a theoretical perspective, the boundary of advantages between connectionist AI and symbolic AI may not be as clear-cut as it appears. For instance, Heng Zhang and his colleagues have proved that mainstream knowledge representation formalisms are recursively isomorphic, provided they are universal or have equivalent expressive power. This finding implies that there is no fundamental distinction between using symbolic or connectionist knowledge representation formalisms for the realization of artificial general intelligence (AGI). Moreover, the existence of recursive isomorphisms suggests that different technical approaches can draw insights from one another. From this perspective, it seems unnecessary to overemphasize the advantages of any single technical school; instead, mutual learning and integration may offer the most promising path toward the realization of AGI.
Rodney Brooks invented behavior-based robotics, one approach to embodied cognition. Nouvelle AI, another name for this approach, is viewed as an alternative to both symbolic AI and connectionist AI. His approach rejected representations, either symbolic or distributed, as not only unnecessary, but as detrimental. Instead, he created the subsumption architecture, a layered architecture for embodied agents. Each layer achieves a different purpose and must function in the real world. For example, the first robot he describes in Intelligence Without Representation, has three layers. The bottom layer interprets sonar sensors to avoid objects. The middle layer causes the robot to wander around when there are no obstacles. The top layer causes the robot to go to more distant places for further exploration. Each layer can temporarily inhibit or suppress a lower-level layer. He criticized AI researchers for defining AI problems for their systems, when: "There is no clean division between perception (abstraction) and reasoning in the real world." He called his robots "Creatures" and each layer was "composed of a fixed-topology network of simple finite state machines." In the Nouvelle AI approach, "First, it is vitally important to test the Creatures we build in the real world; i.e., in the same world that we humans inhabit. It is disastrous to fall into the temptation of testing them in a simplified world first, even with the best intentions of later transferring activity to an unsimplified world." His emphasis on real-world testing was in contrast to "Early work in AI concentrated on games, geometrical problems, symbolic algebra, theorem proving, and other formal systems" and the use of the blocks world in symbolic AI systems such as SHRDLU.
Hybrid AIs incorporating one or more of these approaches are currently viewed as the path forward. Russell and Norvig conclude that:
Overall, Hubert Dreyfus saw areas where AI did not have complete answers and said that Al is therefore impossible; we now see many of these same areas undergoing continued research and development leading to increased capability, not impossibility.
|
|